Abstract 

Objective To determine the completeness and diagnostic validity of 
myocardial infarction recording across four national health record sources 
in primary care, hospital care, a disease registry, and mortality register. 

Design Cohort study. 

Participants 21 482 patients with acute myocardial infarction in England 
between January 2003 and March 2009, identified in four prospectively 
collected, linked electronic health record sources: Clinical Practice 
Research Datalink (primary care data), Hospital Episode Statistics 
(hospital admissions), the disease registry MINAP (Myocardial Ischaemia 
National Audit Project), and the Office for National Statistics mortality 
register (cause specific mortality data). 



Setting One country (England) with one health system (the National 
Health Service). 

Main outcome measures Recording of acute myocardial infarction, 
incidence, all cause mortality within one year of acute myocardial 
infarction, and diagnostic validity of acute myocardial infarction compared 
with electrocardiographic and troponin findings in the disease registry 
(gold standard). 

Results Risk factors and non-cardiovascular coexisting conditions were 
similar across patients identified in primary care, hospital admission, 
and registry sources. Immediate all cause mortality was highest among 
patients with acute myocardial infarction recorded in primary care, which 
(unlike hospital admission and disease registry sources) included patients 
who did not reach hospital, but at one year mortality rates in cohorts 
from each source were similar. 5561 (31.0%) patients with non-fatal 
acute myocardial infarction were recorded in all three sources and 11 
482 (63.9%) in at least two sources. The crude incidence of acute 
myocardial infarction was underestimated by 25-50% using one source 
compared with using all three sources. Compared with acute myocardial 
infarction defined in the disease registry, the positive predictive value of 
acute myocardial infarction recorded in primary care was 92.2% (95% 
confidence interval 91.6% to 92.8%) and in hospital admissions was 
91.5% (90.8% to 92.1%). 

Conclusion Each data source missed a substantial proportion (25-50%) 
of myocardial infarction events. Failure to use linked electronic health 
records from primary care, hospital care, disease registry, and death 
certificates may lead to biased estimates of the incidence and outcome 
of myocardial infarction. 

Trial registration NCT01569139 clinicaltrials.gov. 

Introduction 

Electronic health records inform patient decision making and 
policy and are increasingly used to define disease and the 
outcomes of care in observational cohorts of genetic and 
environmental factors 1-4 and randomised trials. 5-7 Recent 
initiatives to expand the use of health records for research have 
been announced in many countries, 8-11 and in the United 
Kingdom the National Health Service is now legally required 
to evaluate patient outcomes. 12 The UK government has also 
recently announced plans to drive improvement in 
cardiovascular disease care through use of information in linked 
health records. 13 In various settings across the world these 
initiatives are being met by the linkage of electronic health 
records from disparate sources. Underpinning these uses of 
electronic health records is the need for a better understanding 
of the quality of data within a single source as well as between 
multiple sources. Indeed, it is a concern that electronic records 
from one part of the health system, such as primary care, may 
not capture health events occurring in other parts of the health 
system, such as hospital care. 

As part of the CArdiovascular disease research using Linked 
Bespoke studies and Electronic health Records (CALIBER) 
programme 14 we carried out new linkage between records from 
primary care, 15 hospitals, an acute coronary syndrome registry, 16 
and death certificates. Although data from these types of source 
are increasingly available in different countries, 17 18 for acute 
myocardial infarction the overlap between these four electronic 
health record sources, patient risk factors, and subsequent 
mortality have not been compared. Previous cross referencing 
studies have typically compared one or two electronic sources, 
such as coded hospital discharge diagnoses and cause of death, 
with case note review, 19-21 questionnaires to general 
practitioners, 22 or active case finding in a prospective consented 
study 23 24 (see supplementary table 1). Linkages with the national, 
ongoing acute coronary syndrome registry allowed detailed 
diagnoses of myocardial infarction (with coded 
electrocardiographic findings and markers of myocardial 
necrosis, not available in other sources) to be compared with 
diagnoses in primary care and hospital admissions. Linkages 
with primary care allowed evaluation of risk factors in patients 
with a record of acute myocardial infarction in any source. 
Linkages with the death record allowed evaluation of cause 
specific mortality of myocardial infarction recorded in any 
source, including among cases not admitted to hospital. 

We compared the incidence, recording, agreement of dates and 
codes, risk factors, and all cause mortality of acute myocardial 
infarction recorded in primary care, hospital care, the national 
acute coronary syndrome registry, and the national death 
registry. 

Methods 

We used a cohort study design, identifying patients with acute 
myocardial infarction in four prospectively collected, linked 
electronic health record sources in England (the CALIBER 
programme 14 ). Briefly, the CALIBER linkage included 
anonymised primary care electronic patient records from the 
Clinical Practice Research Datalink 15 (www.cprd.com, formerly 
known as the General Practice Research Database), data on 
hospital admissions from Hospital Episode Statistics, the 
national registry of acute coronary syndromes (Myocardial 
Ischaemia National Audit Project, MINAP), 16 and the death 
registry, curated by the Office for National Statistics (see 
supplementary table 2). 

Of the 630 primary care practices in Clinical Practice Research 
Datalink, 244 consented to data linkage with Hospital Episode 
Statistics, MINAP, and the Office for National Statistics. These 
practices contained 3.9% of the population of England in 2006. 
The linkage was carried out in October 2010 by a trusted third 
party, using a deterministic match between NHS number, date 
of birth, and sex. Overall, 96% of patients with a valid NHS 
number were successfully matched. 

Study population: patients with acute 
myocardial infarction 

We identified records of acute myocardial infarction with 
reference to previously described definitions for each source. 
In primary care, diagnoses are recorded using Read codes 25 and 
previous studies have published lists of Read codes used to 
identify acute myocardial infarction. 26 27 We identified 
myocardial infarction using the 62 Read codes listed in 
supplementary table 3. In Hospital Episode Statistics and the 
Office for National Statistics death registry, diagnoses are coded 
using the International Classification of Diseases, 10th revision 28 ; 
in common with previous studies, 29 we defined acute myocardial 
infarction by ICD-10 codes I21 (acute myocardial infarction), 
I22 (subsequent myocardial infarction), or I23 (current 
complications following acute myocardial infarction). In 
Hospital Episode Statistics, to be included in our study 
myocardial infarction had to be recorded as the primary 
diagnosis in the first episode of an admission to hospital (where 
the first episode refers to the first period of care for an admitted 
patient overseen by a healthcare professional 30 ). We performed 
a sensitivity analysis to assess the influence of inclusion of 
secondary diagnoses. In MINAP, ST elevation and non-ST 
elevation myocardial infarction were identified using hospital 
discharge diagnosis, markers of myocardial necrosis, and coded 
electrocardiographic findings, in accordance with the 
internationally agreed definition of myocardial infarction. 31 For 
MINAP and Hospital Episode Statistics, we took the hospital 
admission date to represent the date of acute myocardial 
infarction. 
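
The hospital admission case definition described above can be sketched in a few lines. This is a minimal illustration only, not the authors' code: the function names and the assumption that codes arrive as strings such as "I21.9" are hypothetical.

```python
# Illustrative sketch; helper names and input format are assumptions.
MI_ICD10_PREFIXES = ("I21", "I22", "I23")


def is_acute_mi(icd10_code: str) -> bool:
    """Return True if an ICD-10 code falls under I21, I22, or I23."""
    normalised = icd10_code.strip().upper().replace(".", "")
    return normalised.startswith(MI_ICD10_PREFIXES)


def hes_episode_qualifies(primary_diagnosis: str, episode_order: int) -> bool:
    """Main analysis rule: myocardial infarction must be the primary
    diagnosis in the first episode of the admission."""
    return episode_order == 1 and is_acute_mi(primary_diagnosis)
```

The sensitivity analysis that includes secondary diagnoses would relax the second function to test every diagnosis field rather than the primary one only.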

The study period was 1 January 2003 to 31 March 2009 (when 
all record sources were concurrent) and confined to patients 
who had been registered with their general practice for at least 
a year and whose practice had been submitting data for at least one 
year that met Clinical Practice Research Datalink data quality 
standards for continuity and plausibility of data recording. In 
the main analysis we included only patients with at least one 
record of admission to hospital in Hospital Episode Statistics 
at any time (for any cause) as these patients were shown to be 
linkable, but we conducted a sensitivity analysis including all 
patients. We selected the first record of myocardial infarction 
during the patient's study period as the index event and 
considered myocardial infarction records in the other data 
sources as representing the same event if they were dated within 
30 days of the index event. 

Cardiovascular risk factors and 
non-cardiovascular coexisting conditions 

For patients with acute myocardial infarction we identified risk 
factors recorded in primary care, including age, sex, social 
deprivation, 32 smoking, use of antihypertensives or lipid lowering 
drugs, diabetes mellitus, Charlson comorbidity index, 33 and 
primary care consultation rate before the event. We used mean 
measures of systolic blood pressure and total cholesterol and 
high density lipoprotein cholesterol levels before myocardial 
infarction along with age, sex, and smoking status (where these 
variables were present) to estimate the 10 year Framingham risk 
for acute myocardial infarction or coronary death. 34 We used 
these measures to compare the cohorts of myocardial infarction 
identified in each data source. 

Follow-up for mortality 

We followed all patients with a record of myocardial infarction 
in any source for one year for death as recorded in the Office 
for National Statistics death registry. We categorised patients 
as having fatal or non-fatal myocardial infarction by whether 
they died of any cause within seven days of the myocardial 
infarction. If a patient had a myocardial infarction record in 
Clinical Practice Research Datalink, Hospital Episode Statistics, 
or MINAP after their date of death, we considered that they 
died on the day of their myocardial infarction. 
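
The fatal versus non-fatal rule can be expressed as a short function. This is a sketch under stated assumptions: the function name is hypothetical, and treating a death on day seven as "within seven days" is our reading of the text rather than a confirmed detail.

```python
from datetime import date
from typing import Optional

FATAL_WINDOW_DAYS = 7  # assumption: day 7 itself counts as "within seven days"


def classify_mi(mi_date: date, death_date: Optional[date]) -> str:
    """Classify a myocardial infarction as 'fatal' or 'non-fatal'."""
    if death_date is None:
        return "non-fatal"
    if death_date < mi_date:
        # Record dated after death: treat the patient as having died
        # on the day of the myocardial infarction.
        death_date = mi_date
    days_to_death = (death_date - mi_date).days
    return "fatal" if days_to_death <= FATAL_WINDOW_DAYS else "non-fatal"
```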

Agreement in recording 

If the time difference between the earliest date of acute 
myocardial infarction in one source and the date in another 
source was no more than 30 days we considered that the records 
of acute myocardial infarction in the different sources agreed. 
A myocardial infarction recorded more than 30 days after the 
earliest date was considered a new event and was not included, 
ensuring that each patient appeared only once in the analysis. 
We chose 30 days to account for any delay in recording of 
myocardial infarction in primary care, assuming that any record 
within 30 days of a hospital admission was likely to represent 
the same event and anything after 30 days could feasibly be a 
subsequent myocardial infarction. We carried out a sensitivity 
analysis using a 90 day threshold. 
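
As an illustration of the 30 day agreement rule, the sketch below takes the earliest record across sources as the index event and reports which sources hold a record inside the window. The data layout (a mapping from source name to a list of dates) and the function name are assumptions made for the example.

```python
from datetime import date
from typing import Dict, List, Set, Tuple


def agreeing_sources(
    records: Dict[str, List[date]], window_days: int = 30
) -> Tuple[date, Set[str]]:
    """Return the index date (earliest record in any source) and the set
    of sources with a record no more than `window_days` after it."""
    all_dates = [d for dates in records.values() for d in dates]
    if not all_dates:
        raise ValueError("no myocardial infarction records supplied")
    index_date = min(all_dates)
    agreeing = {
        source
        for source, dates in records.items()
        if any((d - index_date).days <= window_days for d in dates)
    }
    return index_date, agreeing


# Hypothetical patient: hospital admission first, primary care record
# nine days later, registry record 75 days later.
example = {
    "CPRD": [date(2005, 3, 10)],
    "HES": [date(2005, 3, 1)],
    "MINAP": [date(2005, 5, 15)],
}
index_30, sources_30 = agreeing_sources(example)
index_90, sources_90 = agreeing_sources(example, window_days=90)
```

With the 30 day window the registry record counts as a later event and is excluded; with the 90 day threshold of the sensitivity analysis all three sources agree.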

Statistical analysis 

Incidence 

We estimated population based incidence rates of fatal and 
non-fatal acute myocardial infarction using the denominator of 
all adults in the CALIBER primary care population aged 18 and 
over (2.2 million), followed up for a mean 4.1 years between 
2003 and 2009. We used each of the data sources separately 
and together to identify incident myocardial infarction, ending 
the follow-up period for a patient on the date of their first 
myocardial infarction during the study period, death, or 
deregistration from the general practice. 
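
A crude incidence rate and its confidence interval can be computed as below. This is a sketch only: the text does not state how the intervals were derived, so the normal approximation to the Poisson distribution is an assumption, and the event count and patient years in the example are made up for illustration.

```python
import math
from typing import Tuple


def crude_incidence(
    events: int, patient_years: float, per: float = 100_000, z: float = 1.96
) -> Tuple[float, float, float]:
    """Rate per `per` patient years with an approximate 95% interval.

    Assumes the event count is Poisson distributed, so its standard
    error is sqrt(events); this is not necessarily the authors' method.
    """
    rate = events / patient_years * per
    half_width = z * math.sqrt(events) / patient_years * per
    return rate, rate - half_width, rate + half_width


# Hypothetical inputs, not taken from the study.
rate, lower, upper = crude_incidence(events=1870, patient_years=1_000_000)
```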



Cardiovascular risk factors and 
non-cardiovascular coexisting conditions 

We compared patients with fatal and non-fatal acute myocardial 
infarction identified in the four data sources for risk factors and 
coexisting conditions recorded in primary care. 

Death after acute myocardial infarction 

We produced cumulative incidence curves for coronary and 
non-coronary mortality for patients recorded in each data source 
and compared mortality using a Cox proportional hazards model 
adjusted for age and sex. 

Agreement in recording 

We would expect patients who survived seven days after 
myocardial infarction to be recorded in the primary care, disease 
registry, and hospital admissions sources, and we assessed 
agreement between these three sources in a Venn diagram. For 
patients who died within seven days, we examined the 
proportion recorded by each source but did not compare 
agreement across all four sources as we would not expect the 
hospital discharge data and disease registry to record patients 
who died before reaching hospital. 

In patients who did not have a record of acute myocardial 
infarction in one or more data sources, we looked for other codes 
that may have been used to describe the event. In the disease 
registry, we looked for unstable angina or admission diagnoses 
of any acute coronary syndrome. In primary care and hospital 
discharge data, we sought other acute coronary syndromes, 
coronary disease, chest pain, or other cardiac diagnoses (for 
example, atrial fibrillation, heart failure, cardiac arrest). In 
primary care data we also examined codes indicating contact 
with secondary care. Where none of these codes was recorded, 
we tabulated all recorded codes in the 30 days before and after 
the date of myocardial infarction to see if there were any relevant 
codes we had overlooked. 

We performed a logistic regression analysis to establish whether 
age, sex, deprivation, rate of primary care consultation, year of 
myocardial infarction, or mortality at 30 days explained 
suboptimal recording of acute myocardial infarction in primary 
care, hospital discharge, or disease registry sources. 

We calculated the positive predictive value of primary care or 
hospital discharge diagnoses of acute myocardial infarction 
among patients who also had a record in the acute coronary 
syndrome registry. Data were analysed using Stata 12 and R 
2.14.1. 35 

Results 

We identified 21 482 patients with fatal or non-fatal acute 
myocardial infarction recorded in any of the four data sources. 

Incidence 

Among the single source crude estimates for incidence of 
myocardial infarction, primary care data (Clinical Practice 
Research Datalink) gave the highest estimate, of 187 per 100 
000 patient years (95% confidence interval 184 to 190), followed 
by hospital discharge data (Hospital Episode Statistics) with 
154 per 100 000 patient years (152 to 157), acute coronary 
syndrome registry (MINAP) with 115 per 100 000 patient years 
(113 to 118), and death registry with 45 per 100 000 patient 
years (43 to 46). Combining these three sources yielded an 
estimate of 243 per 100 000 patient years (239 to 246, fig 1). 
The crude incidence of acute myocardial infarction was 25% 
lower using only Clinical Practice Research Datalink and 50% 
lower using only MINAP compared with using all three sources. 
See supplementary figure 1 and table 4 for standardised 
incidence by age, sex, and region. 
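
The size of the underestimate follows directly from the crude rates reported above. The short calculation below uses only those published point estimates; the variable and function names are ours.

```python
# Crude incidence per 100 000 patient years, as reported in the text.
single_source_rates = {
    "CPRD": 187,
    "HES": 154,
    "MINAP": 115,
    "ONS death registry": 45,
}
combined_rate = 243


def shortfall_percent(single: float, combined: float) -> float:
    """Percentage by which a single source falls short of the combined rate."""
    return 100 * (1 - single / combined)


shortfalls = {
    source: round(shortfall_percent(rate, combined_rate), 1)
    for source, rate in single_source_rates.items()
}
```

On these point estimates the shortfall is about 23% for Clinical Practice Research Datalink and about 53% for MINAP, in line with the rounded 25% and 50% quoted in the text.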

Cardiovascular risk factors and comorbidity 

Overall, the cohorts identified from the primary care, hospital, 
and disease registry sources had a similar prevalence of 
cardiovascular risk factors and comorbidities. However, 
compared with those recorded in the acute coronary syndrome 
registry or hospital discharge data only, patients with fatal or 
non-fatal myocardial infarction recorded only in primary care 
were on average two years younger and more likely to be current 
smokers and in the most deprived fifth (P<0.001 for these 
comparisons, also see supplementary table 5). Patients recorded 
by the death registry were older than patients recorded in the 
other sources and had a higher burden of risk factors reflecting 
their age. However, other demographic characteristics and 
cardiovascular risk factors were broadly similar across patients 
recorded in primary care and hospital care sources (table 1, 
also see supplementary table 5). 

Death after acute myocardial infarction 

Patients with myocardial infarction identified in the disease 
registry had lower crude 30 day mortality (10.8%, 95% 
confidence interval 10.2% to 11.4%) than those identified in 
hospital care (13.9%, 13.3% to 14.4%) or in primary care 
(14.9%, 14.4% to 15.5%, fig 2). At one year, however, 
mortality was similar in all three groups, at around 20%. 

In the linked data, patients with acute myocardial infarction 
recorded in only one source had higher mortality than those 
recorded in more than one source (age and sex adjusted hazard 
ratio 2.29, 95% confidence interval 2.17 to 2.42; P<0.001). 
Among patients with myocardial infarction recorded in only 
one source (Hospital Episode Statistics, Clinical Practice 
Research Datalink, or MINAP), those recorded only in primary 
care had the highest mortality on the first day but the lowest 
mortality thereafter (see supplementary figures 2 and 3). Among 
patients with myocardial infarctions recorded in one of Hospital 
Episode Statistics or MINAP but not both, those in MINAP had 
lower coronary mortality in the first month (age and sex adjusted 
hazard ratio 0.33, 0.28 to 0.39, P<0.001) but similar mortality 
for non-coronary events (1.12, 0.90 to 1.40, P=0.3). After the 
first month, patients with myocardial infarctions recorded only 
in primary care had about half the hazard of mortality of patients 
with myocardial infarctions recorded in one of MINAP or 
Hospital Episode Statistics (hazard ratio adjusted for age and 
sex for coronary causes 0.49, 95% confidence interval 0.40 to 
0.60, P<0.001 and for other causes 0.57, 0.49 to 0.67, P<0.001). 
Of the 3518 patients with myocardial infarction recorded in any 
of the four sources who died of any cause within seven days, 
54.4% (n=1914) had a myocardial infarction code recorded in 
primary care within 30 days. The underlying cause of death was 
acute myocardial infarction in 2924 patients (83.0%); a further 
164 patients (4.7%) had ischaemic heart disease recorded as the 
underlying cause of death (ICD-10 code I20, I24, or I25), 60 
(1.7%) had cerebrovascular disease (I60-I69), and 85 (2.4%) 
had respiratory disease (J00-J99). However, 3375 of these 3518 
patients (95.9%) had a coronary diagnosis (I20-I25) as either 
the underlying cause or a secondary cause of death. 

Fatal myocardial infarctions identified by death registry data 
(underlying cause of death ICD-10 I21, I22, or I23, n=2919) 
were unlikely to be recorded in hospital sources; 36.7% 
(n=1072) were recorded in Hospital Episode Statistics and just 
17.1% (n=498) in the MINAP disease registry within 30 days, 
but 55.9% (n=1631) were recorded in primary care (see 
supplementary table 6). 

Non-fatal acute myocardial infarction: 
agreement between record sources 

Among the 17 964 patients with at least one record of non-fatal 
acute myocardial infarction, 13 380 (74.5%) were recorded by 
Clinical Practice Research Datalink, 12 189 (67.9%) by Hospital 
Episode Statistics, and 9438 (52.5%) by MINAP. Overall, 5561 
(31.0%) of patients had the event recorded in all three sources 
and 11 482 (63.9%) in at least two sources (fig 3). When we 
extended the recording window from 30 days to 90 days, the 
proportion recorded in all three sources increased only slightly, 
to 32.0% (n=5747). When we included patients who had never 
had a record of a hospital admission in Hospital Episode 
Statistics, the proportion of non-fatal myocardial infarctions 
recorded in all three sources decreased slightly to 30.0% 
(5561/18 536) and the proportion recorded only in primary care 
increased from 17.7% (3188/17 964) to 20.3% (3760/18 536). 
A sensitivity analysis in which the Hospital Episode Statistics 
case definition included secondary diagnoses of myocardial 
infarction (where myocardial infarction was not the reason for 
admission) produced only a slight increase in the proportion 
recorded in all three sources (5812/18 283, 32.0%), and 
identified 306 additional myocardial infarctions that were not 
in any other source. 
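
The overlap counts for the main analysis can be checked for internal consistency by simple counting: a patient in exactly k sources contributes k records to the per-source totals. The sketch below uses only figures reported above; the derived counts (exactly two sources, exactly one source) are our arithmetic rather than numbers stated in the text.

```python
# Reported counts for non-fatal myocardial infarction (main analysis).
total_patients = 17_964
per_source = {"CPRD": 13_380, "HES": 12_189, "MINAP": 9_438}
all_three = 5_561
at_least_two = 11_482

# Derived counts.
exactly_two = at_least_two - all_three
exactly_one = total_patients - at_least_two

# Each patient contributes one record per source in which they appear.
records_implied = exactly_one + 2 * exactly_two + 3 * all_three
records_reported = sum(per_source.values())

pct_all_three = round(100 * all_three / total_patients, 1)
pct_at_least_two = round(100 * at_least_two / total_patients, 1)
```

Both totals come to 35 007 records, so the per-source counts and the overlap figures are mutually consistent, and the percentages reproduce the reported 31.0% and 63.9%.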

The exact date of admission agreed in over 80% of 6851 patients 
with acute myocardial infarction recorded in both hospital care 
and disease registry sources (see supplementary figure 4), but 
the date recorded in primary care was the same as the disease 
registry admission date or hospital admission date for only 50% 
of patients (n=15 753). There was a smaller peak in primary 
care recording between five and seven days after admission. 
When the time window was extended to 90 days, there was little 
change in these proportions. 

Among patients with non-fatal myocardial infarction, 88.0% 
(8304/9438) of those recorded in MINAP and 89.1% (10 859/12 
189) recorded in Hospital Episode Statistics had a Read code 
for any cardiac diagnosis or chest pain within 30 days in primary 
care, and in over 70% the Read code stated myocardial infarction 
(see supplementary table 6). Only 25.1% (3364/13 380) of the 
non-fatal myocardial infarctions recorded in primary care stated 
the type — that is, ST elevation or non-ST elevation — compared 
with 100% for the disease registry. If a non-fatal myocardial 
infarction was recorded in primary care, hospital discharge data 
recorded a cardiac diagnosis within 30 days in 84.9% (11 355/13 
380) of patients, with a primary diagnosis of myocardial 
infarction in 72.6% (9720/13 380). However, this proportion 
varied depending on the Read term used to identify myocardial 
infarction in primary care; for terms that state the anatomical 
location (for example, acute anterolateral infarction) it was 
around 80% but was lower for less precise terms. For example, 
of the 74 patients with the Read term "heart attack," only 32 
(43%) had a primary hospital diagnosis of myocardial infarction. 
Supplementary tables 7-9 describe the agreement between 
sources according to the way in which acute myocardial 
infarction was recorded in each source. 

Positive predictive value 

For primary care or hospital discharge patients with an 
associated record in the disease registry (MINAP), the positive 
predictive value of the acute myocardial infarction diagnosis 
(the probability that the diagnosis recorded in the disease registry 
was myocardial infarction rather than unstable angina or a 
non-cardiac diagnosis) was 92.2% (6660/7224, 95% confidence 
interval 91.6% to 92.8%) in primary care and 91.5% (6851/7489, 
90.8% to 92.1%) in hospital care (table 2). Eighty five percent 
of patients recorded in primary care and hospital discharge 
(7386/8707) had a record of raised cardiac markers and half 
(3766/8707) had a record of ST segment elevation on 
electrocardiography. 
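
The positive predictive values above can be reproduced from the reported counts. The sketch below assumes a normal approximation (Wald) interval for a proportion; the text does not say which interval method was used, but this choice reproduces the published figures to one decimal place.

```python
import math
from typing import Tuple


def ppv_with_ci(
    confirmed: int, recorded: int, z: float = 1.96
) -> Tuple[float, float, float]:
    """Positive predictive value (%) with an approximate 95% interval.

    Uses the normal approximation to the binomial; an assumption, not
    necessarily the method used in the paper.
    """
    p = confirmed / recorded
    half_width = z * math.sqrt(p * (1 - p) / recorded)
    return tuple(round(100 * v, 1) for v in (p, p - half_width, p + half_width))


primary_care = ppv_with_ci(6660, 7224)   # counts reported in the text
hospital_care = ppv_with_ci(6851, 7489)  # counts reported in the text
```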

Non-fatal acute myocardial infarction: reasons 
for disagreement 

Compared with patients who had a record of acute myocardial 
infarction in only one source, those with records in multiple 
sources had a lower rate of primary care consultation before the 
event, were younger, were more likely to be male, and more 
likely to have experienced acute myocardial infarction in one 
of the later years of data collection. Among patients with 
myocardial infarction recorded in primary care, an additional 
record in Hospital Episode Statistics or MINAP was strongly 
associated with increased mortality at 30 days (see 
supplementary table 10). 

Discussion 

We compared electronic health records on one major disease 
event — acute myocardial infarction — across four English, 
ongoing sources of health record data: primary care (Clinical 
Practice Research Datalink), hospital admissions (Hospital 
Episode Statistics), a quality improvement disease registry 
(Myocardial Ischaemia National Audit Project, MINAP), and 
the death registry (Office for National Statistics). In over 20 
000 patients each data source missed a substantial proportion 
of myocardial infarction events. We also found evidence for the 
validity of myocardial infarction recording across all sources, 
in terms of risk factor profiles and mortality at one year. Taken 
together, these findings support the wider use of linkage of 
multiple record sources by clinicians, policy makers, and 
researchers. 

Fatal myocardial infarction 

Both primary care and death registry data can be used to capture 
fatal myocardial infarction occurring out of hospital among 
people without a record of myocardial infarction in the Hospital 
Episode Statistics or disease registry (MINAP). The death 
registry is a useful source of fatal acute myocardial infarction 
for research, as most (83.0%) patients who were identified as 
having acute myocardial infarction in any of the data sources 
and died within seven days had myocardial infarction recorded 
as their underlying cause of death. These figures agree with 
results from the Oxford Record Linkage Study, where among 
5686 patients admitted to hospital with myocardial infarction 
85.2% who died within 30 days had myocardial infarction 
recorded as the underlying cause of death. 36 

Non-fatal myocardial infarction 

Primary care captures most cases, but all sources 
miss non-fatal myocardial infarction 

We found that each record source misses cases. Only one third 
of non-fatal myocardial infarctions were recorded in all three 
data sources (primary care, hospital admissions, and disease 
registry) and two thirds were recorded in at least two sources. 
Clinical Practice Research Datalink was the single most 
complete source of non-fatal myocardial infarction records (one 
quarter of all non-fatal myocardial infarction events not 
recorded), Hospital Episode Statistics missed one third, and 
MINAP missed nearly half (fig 3). This agrees with the results 
of other studies; a two source study of myocardial infarction in 
Scotland (see supplementary table 1) compared the incidence 
based on primary care records with that based on hospital data 
and showed that in combination they provided the highest 
estimates of incidence. 37 Further two source comparisons in 
Australia, 38 Denmark, 39 and the Netherlands 24 (see supplementary 
table 1) have shown that hospital records alone underestimate 
the true incidence of myocardial infarction. Despite the low 
sensitivity of these data sources, in our study the positive 
predictive value of myocardial infarction records in primary 
care and hospital admission sources was over 90% compared 
with the disease registry gold standard based on the international 
definition of myocardial infarction (table 2). 

However, some of the myocardial infarctions recorded only in 
primary care are likely to be historical diagnoses because the 
associated mortality rate in the first month is much lower than 
those also recorded in hospital sources (see supplementary figure 
2). Our results using cross referencing of electronic health 
records in 20 000 patients are consistent with previous manual 
approaches to validation in primary care, which cross reference 
a few hundred patients with disease diagnoses recorded using 
Read codes against anonymised free text, death certificates, 
paper medical records, or hospital discharge summaries, or 
questionnaires to general practitioners. 40-48 Our much larger 
sample size, however, allowed us to evaluate individual Read 
terms that are used to record myocardial infarction. This type 
of validation has not been done previously for myocardial 
infarction and may be relevant to other common conditions that 
can be recorded using a variety of codes, such as stroke. 49 

Hospital admission data 

To our knowledge, no studies have examined the positive 
predictive value of ICD-10 coded myocardial infarction 
diagnosis in hospital admission data against an ongoing disease 
registry. We found that 62.8% of non-fatal myocardial 
infarctions recorded in primary care and 72.6% recorded in the 
disease registry were recorded by hospital admissions data. This 
is consistent with a single electronic health record source 
(Hospital Episode Statistics ICD-10 I21 and I22) capturing 53% 
of myocardial infarctions in an investigator led cohort with 
active follow-up. 50 

Disease registry and maximising true positives 

The strengths of the disease registry MINAP lie in the fact that 
its diagnostic records (troponin values, electrocardiographic 
findings, and cardiologist diagnosis of ST elevation and non-ST 
elevation myocardial infarction) are not available in other 
sources, which offer validated endpoints electronically from all 
hospitals in England and Wales. An acute myocardial infarction 
recorded in MINAP is thus an electronic health record gold 
standard, as a myocardial infarction recorded by a registry is 
likely to fulfil international diagnostic criteria. 16 The registry 
may be important for detecting endpoints in cohort studies and 
trials, where false positives can dilute any observed effect and 
reduce the power of a study. Furthermore, in such studies it has 
been shown that avoiding false positives is more important than 
avoiding false negatives. 51 Validation of myocardial infarctions 
recorded by primary care and hospital admissions against those 
recorded by the disease registry showed a positive predictive 
value of over 90%, making them suitable for detecting endpoints 
in cohort studies and trials, where poor endpoint resolution can 
dilute any observed effect and reduce the power of a study. 51 
The positive predictive value was not 100% because some 
myocardial infarction records in primary care may actually have 
been related to unstable angina or chest pain of an unknown 
cause. 

Limitations of this study 

Our data were from a sample of 244 English general practices 
contributing to the Clinical Practice Research Datalink. 
However, the primary care patients included in this CALIBER 
study are representative of the UK population. Furthermore, 
patients in practices that participated in the linkage were 
representative of the Clinical Practice Research Datalink as a 
whole in terms of age, social deprivation, body mass index, and 
prescription of key drugs. 52 Hospital admissions with linked 
primary care data were also representative of all admissions to 
hospital in England in terms of the distribution of age, sex, and 
diagnostic group. 53 The UK life science strategy aims to increase 
the proportion of the UK population with primary care data 
available for linked electronic health record research through 
the Clinical Practice Research Datalink. 15 A second limitation 
concerns the generalisability of our findings for the quality of 
primary care data. Practices contributing data to the Clinical 
Practice Research Datalink are advised of recording guidelines 
and their data are accepted only when they meet standards of 
data completeness, 15 so they are likely to record disease events 
better than general practices that do not contribute. Our estimates 
of agreement from this study may therefore be higher than for 
practices that do not contribute to the Clinical Practice Research 
Datalink. Thirdly, our validation of Hospital Episode Statistics 
and Clinical Practice Research Datalink myocardial infarctions 
against MINAP was (inherently) limited to the subset of patients 
with a MINAP record, and caution must be exercised in 
extending these conclusions to patients with myocardial 
infarctions without a MINAP record. 

Clinical and policy implications 

With the current emphasis on measuring clinical outcomes in 
health systems and recent plans to use linked data to drive 
improvements in the care of patients with cardiovascular 
disease, 4 13 our study has important implications for practice and 
policy. Firstly, we propose much wider use of linked record 
sources in commissioning and in research to estimate disease 
occurrence and outcome, because of the biases inherent in using 
only one source of records. Changing the estimates for incidence 
of myocardial infarction could potentially alter the modelled 
effect of population based healthcare interventions. Our findings 
underscore the importance of international initiatives to 
accelerate availability of linked data in America, 1 155 in Europe, 56 
and elsewhere. Secondly, a national strategy for biomedical 
informatics is required to tackle manifest system failings: a 
single health event should, ideally, have a single record that is 
propagated in multiple record systems. Efforts to reform the 
process of death certification are already under way, 57 but this 
needs to be broadened to include other health records. Thirdly, 
and more specifically, primary care records could be improved 
if the admission date rather than discharge date was used to 
record the myocardial infarction (as reflected by the current 
"tail" of myocardial infarction records recorded up to 20 days 
after admission; see supplementary figure 4) and if acute 
myocardial infarction was recorded only at its occurrence, rather 
than being entered repeatedly at consultations related to a history of 
myocardial infarction. Fourthly, disease registries, such as 
MINAP, could be improved if embedded in real time clinical 
care of all patients with myocardial infarction, rather than the 
current situation in which hospitals employ audit staff to 
retrospectively enter records on patients in coronary care units. 
This needs to be dealt with to obtain a more complete 
understanding of the quality of care provided to patients with 
acute coronary syndromes. 
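
The first implication above, that relying on one record source biases estimates of disease occurrence, can be illustrated with a rough calculation. The sketch below is not the incidence analysis shown in figure 1. It uses only the patient counts given in table 1 and in the caption of figure 2, and it assumes that every source shares the same population denominator, so that the ratio of patient counts approximates the ratio of crude incidences. The names `PATIENTS_BY_SOURCE`, `PATIENTS_ANY_OF_THREE`, `shortfall_percent`, and `shortfalls` are illustrative.

```python
# Patients with a myocardial infarction record in each source (table 1).
PATIENTS_BY_SOURCE = {"CPRD": 15_819, "HES": 13_831, "MINAP": 10_351}

# Distinct patients across the three sources (caption of figure 2).
PATIENTS_ANY_OF_THREE = 20_819


def shortfall_percent(single_source_count, combined_count):
    """Percentage of the combined cases that one source does not capture."""
    return 100 * (1 - single_source_count / combined_count)


def shortfalls():
    """Return {source: shortfall %} rounded to one decimal place."""
    return {
        source: round(shortfall_percent(count, PATIENTS_ANY_OF_THREE), 1)
        for source, count in PATIENTS_BY_SOURCE.items()
    }


if __name__ == "__main__":
    for source, percent in shortfalls().items():
        print(f"{source} alone misses about {percent}% of patients")
```

On these counts a single source misses about 24% (Clinical Practice Research Datalink), 34% (Hospital Episode Statistics), or 50% (MINAP) of the patients found across the three sources, which is in line with the reported 25-50% underestimation of crude incidence.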

Future research 

Several lines of research are warranted by our findings. Firstly, 
research is required to understand how electronic health record 
data are coded — historically under-resourced and lacking audit 
against quality standards — and how this can be improved. 
Secondly, more extensive cross referencing is required against 
additional sources of information on myocardial infarction. 
These include self reported myocardial infarction (which may 
be less dependent on specific setting in the health system), 
manual review of all the available local case records (paper and 
electronic), and investigation of electronic free text recorded 
by general practitioners (for example, diagnoses that are not 
recorded using a Read code). Such efforts are underway in the 
UK Biobank cohort (n=500 000). 3 There is a need for 
investigator led cohorts and trials to link with the primary care 
record. 58 Although cancer registries do not record gold standard 
diagnostic criteria or cancer stage, it will be important to 
understand how linkages with primary care, admission to 
hospital, and mortality data compare. 55 This is essential for large 
studies where manual review of case records is not feasible. 
Evaluating the quality of the data available in these linked data 
sources is therefore a priority. 

Conclusion 

Failure to use linked electronic health records from primary 
care, hospital care, disease registry, and death certificates may 
lead to biased estimates of the incidence and outcome of 
myocardial infarction. 

What is already known on this topic 

Electronic health records are increasingly used to measure outcomes of healthcare and health policy, and for research in observational 
cohorts and randomised trials 

Records from one part of the health system, such as primary care, may not capture health events occurring in other parts of the health 
system, such as hospital care 

No studies have addressed the completeness and validity of recording of myocardial infarction across four national health record sources: 
primary care, hospital care, disease registry, and death records 

What this study adds 

About one third of patients had a record of non-fatal acute myocardial infarction in all three of primary care, hospital care, and disease 
registry sources, and about two thirds in at least two sources 

Risk factor profiles and one year all cause mortality were comparable across myocardial infarction records from different sources 

Crude incidence of acute myocardial infarction was underestimated by 25-50% using one source compared with using all three sources 
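
The proportions summarised above can be checked against the reported counts. This is a minimal sketch that uses the figures from the abstract (5561 patients in all three sources, 11 482 in at least two) and the total from figure 3 (17 964 patients with non-fatal myocardial infarction). The constant and function names are illustrative.

```python
# Counts for non-fatal acute myocardial infarction.
NON_FATAL_PATIENTS = 17_964  # total across the three sources (figure 3)
IN_ALL_THREE = 5_561         # recorded in all three sources (abstract)
IN_AT_LEAST_TWO = 11_482     # recorded in two or three sources (abstract)


def share(count, total=NON_FATAL_PATIENTS):
    """Percentage of non-fatal patients, rounded to one decimal place."""
    return round(100 * count / total, 1)


def overlap_summary():
    """Return the share of patients by number of sources recording the event."""
    exactly_two = IN_AT_LEAST_TWO - IN_ALL_THREE
    only_one = NON_FATAL_PATIENTS - IN_AT_LEAST_TWO
    return {
        "all three": share(IN_ALL_THREE),
        "at least two": share(IN_AT_LEAST_TWO),
        "exactly two": share(exactly_two),
        "only one": share(only_one),
    }


if __name__ == "__main__":
    for label, percent in overlap_summary().items():
        print(f"{label}: {percent}%")
```

The same counts imply that roughly one third of patients (36.1%) had their non-fatal myocardial infarction recorded in only one of the three sources.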

Tables 



Table 1| Recording of risk factors in primary care before myocardial infarction recorded in primary care, hospital admission, disease registry, and death registry sources from 1 January 2003 to 31 March 2009 

| Characteristics | Primary care: CPRD | Hospital admissions: HES | Disease registry: MINAP | Cause specific mortality: ONS |
|---|---|---|---|---|
| No of patients | 15 819 | 13 831 | 10 351 | 4017 |
| Median (interquartile range) age (years) | 73 (61-81) | 73 (61-82) | 72 (61-81) | 81 (73-87) |
| Women | 5810 (36.7) | 5072 (36.7) | 3649 (35.3) | 1752 (43.6) |
| Most deprived fifth* | 3211 (20.3) | 2641 (19.1) | 1997 (19.3) | 849 (21.1) |
| Smoking: | | | | |
| Current | 4147 (26.2) | 3608 (26.1) | 2729 (26.4) | 638 (15.9) |
| Former | 9414 (59.5) | 8176 (59.1) | 6194 (59.8) | 2622 (65.3) |
| None | 1933 (12.2) | 1745 (12.6) | 1341 (13.0) | 521 (13.0) |
| Missing | 325 (2.1) | 302 (2.2) | 87 (0.8) | 236 (5.9) |
| Mean (SD) systolic blood pressure (mm Hg)† | 145 (15.4) | 145 (15.6) | 145 (15.2) | 146 (16.1) |
| Missing | 385 (2.4) | 351 (2.5) | 198 (1.9) | 64 (1.6) |
| Use of blood pressure lowering drugs | 9149 (57.8) | 7907 (57.2) | 5950 (57.5) | 2919 (72.7) |
| Mean (SD) total serum cholesterol (mmol/L)† | 5.4 (0.9) | 5.4 (0.9) | 5.4 (0.9) | 5.2 (0.9) |
| Missing | 4646 (29.3) | 4291 (31.0) | 2927 (28.3) | 1101 (27.4) |
| Mean (SD) HDL cholesterol (mmol/L)† | 1.3 (0.3) | 1.3 (0.3) | 1.3 (0.3) | 1.4 (0.3) |
| Missing | 6985 (44.2) | 6214 (44.9) | 4443 (42.9) | 1703 (42.4) |
| Use of lipid lowering drugs | 5632 (35.6) | 4686 (33.9) | 3669 (35.4) | 1757 (43.7) |
| Framingham hard coronary disease risk score‡: | | | | |
| <10% | 1273 (8.0) | 1019 (7.4) | 851 (8.2) | 186 (4.6) |
| 10-20% | 4718 (29.8) | 4121 (29.8) | 3181 (30.7) | 1248 (31.1) |
| >20% | 2799 (17.7) | 2439 (17.6) | 1841 (17.8) | 872 (21.7) |
| Missing | 7029 (44.4) | 6252 (45.2) | 4478 (43.3) | 1711 (42.6) |
| Diabetes | 2885 (18.2) | 2467 (17.8) | 1858 (17.9) | 927 (23.1) |
| Mean (SD) Charlson index | 2.5 (1.7) | 2.4 (1.6) | 2.4 (1.6) | 3.2 (1.9) |
| Median (interquartile range) primary care consultation rate per year | 3.7 (1.6-7.9) | 3.5 (1.5-7.7) | 3.6 (1.6-7.8) | 5.0 (2.3-9.8) |

CPRD=Clinical Practice Research Datalink; HES=Hospital Episode Statistics; MINAP=Myocardial Ischaemia National Audit Project; ONS=Office for National Statistics; HDL=high density lipoprotein. 

The total number of patients was 21 482. Patients might be represented in more than one column if their myocardial infarction was recorded in more than one source. 

*Assessed by index of multiple deprivation. 

†Mean of measurements before date of myocardial infarction. 

‡Based on patients with complete data for blood pressure and cholesterol levels. 

Table 2| Information recorded in disease registry (MINAP) within 30 days for non-fatal myocardial infarction recorded in primary care (Clinical Practice Research Datalink, CPRD) or hospital admissions (Hospital Episode Statistics, HES). Values are numbers (percentages) unless stated otherwise 

| Information in MINAP | Source of myocardial infarction record: CPRD | Source of myocardial infarction record: HES | Source of myocardial infarction record: CPRD and HES |
|---|---|---|---|
| No of patients | 7224 | 7489 | 6006 |
| Electrocardiographic findings: | | | |
| ST elevation | 3373 (46.7) | 3455 (46.1) | 3062 (51.0) |
| Other abnormality | 2337 (32.4) | 2485 (33.2) | 1816 (30.2) |
| Normal | 389 (5.4) | 429 (5.7) | 306 (5.1) |
| Not recorded | 1125 (15.6) | 1120 (15.0) | 822 (13.7) |
| Cardiac markers: | | | |
| Raised | 6149 (85.1) | 6358 (84.9) | 5121 (85.3) |
| Normal | 368 (5.1) | 411 (5.5) | 299 (5.0) |
| Missing | 707 (9.8) | 720 (9.6) | 586 (9.8) |
| Peak troponin: | | | |
| Level recorded | 6109 (84.6) | 6357 (84.9) | 5029 (83.7) |
| Median (interquartile range) ng/ml | 2.03 (0.47-10.0) | 2.04 (0.45-10.2) | 2.34 (0.53-11.6) |
| CALIBER diagnosis*: | | | |
| ST elevation myocardial infarction | 3386 (46.9) | 3441 (46.0) | 3064 (51.0) |
| Non-ST elevation myocardial infarction | 3274 (45.3) | 3410 (45.5) | 2497 (41.6) |
| Unstable angina | 384 (5.3) | 425 (5.7) | 312 (5.2) |
| Other | 180 (2.5) | 213 (2.8) | 133 (2.2) |
| Positive predictive value (95% CI) for myocardial infarction† | 92.2 (91.6 to 92.8) | 91.5 (90.8 to 92.1) | 92.6 (91.9 to 93.3) |

MINAP=Myocardial Ischaemia National Audit Project. 

*MINAP contains details of admissions with suspected acute coronary syndromes. The CALIBER algorithm assigns a diagnosis based on troponin, ECG findings, and discharge diagnosis recorded in MINAP. 

†Positive predictive value of HES or CPRD myocardial infarction is calculated considering a MINAP diagnosis of myocardial infarction as gold standard. 

Fig 1 Crude incidence of acute fatal and non-fatal myocardial infarction estimated using different combinations of data from 
primary care (Clinical Practice Research Datalink), hospital admissions (Hospital Episode Statistics), disease registry 
(MINAP, Myocardial Ischaemia National Audit Project), and death registry (Office for National Statistics) 

Fig 2 Kaplan-Meier curves showing all cause mortality, stratified by record source in 20 819 patients: Clinical Practice 
Research Datalink (n=15 819), Hospital Episode Statistics (n=13 831), Myocardial Ischaemia National Audit Project (MINAP) 
(n=10 351). Myocardial infarctions recorded by the Office for National Statistics are not shown as they are by definition fatal 
on the date of myocardial infarction 

Fig 3 Number and percentage of records recorded in primary care (Clinical Practice Research Datalink), hospital care 
(Hospital Episode Statistics), and disease registry (Myocardial Ischaemia National Audit Project) for non-fatal myocardial 
infarction across the three sources (n=17 964 patients) 